Integrating pitch and localisation cues at a speech fragment level
Abstract
This paper proposes a novel speech-fragment-based approach for processing binaural data to improve the estimation of speech source locations in reverberant, multi-speaker recordings. The technique employs two stages. First, a robust multipitch tracking algorithm is used to locate local spectro-temporal ‘speech fragments’, that is, regions where the energy in the mixture is dominated by a single speech source. Second, robust localisation estimates are formed by integrating interaural time difference cues over each speech fragment. The technique is applied to the analysis of more than five hours of two-party meetings that have been constructed from a mixture of binaural mannequin recordings. It is shown that estimating location at the speech fragment level produces better results than conventional location-estimate smoothing techniques, leading to an increase in relative frame accuracy rate of more than 35%.
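The second stage can be illustrated with a minimal sketch. It is not the paper's implementation: it assumes the fragments have already been found, that each time-frequency cell carries a cross-correlation function between the two ear signals, and that "integrating" the interaural time difference cues means summing those functions over a fragment before picking the peak lag. The function name `fragment_itd` and the toy data are hypothetical.

```python
from collections import defaultdict


def fragment_itd(xcorr, fragment_labels, lags):
    """Pool cross-correlation over each fragment and return its peak lag.

    xcorr[t][f] is a list of cross-correlation values (one per lag) for
    frame t and frequency channel f. fragment_labels[t][f] is a fragment
    id, or None for cells that belong to no fragment. Summation is an
    assumed pooling rule, chosen only for illustration.
    """
    pooled = defaultdict(lambda: [0.0] * len(lags))
    for t, row in enumerate(fragment_labels):
        for f, label in enumerate(row):
            if label is None:
                continue
            acc = pooled[label]
            for k, value in enumerate(xcorr[t][f]):
                acc[k] += value
    # One ITD estimate (in samples) per fragment: the lag with the most
    # pooled cross-correlation energy.
    return {
        label: lags[max(range(len(lags)), key=acc.__getitem__)]
        for label, acc in pooled.items()
    }


# Toy data: 2 frames x 2 channels, candidate lags of -1, 0, +1 samples.
lags = [-1, 0, 1]
xcorr = [
    [[0.1, 0.2, 0.9], [0.8, 0.1, 0.1]],
    [[0.5, 0.4, 0.3], [0.2, 0.3, 0.7]],  # cell (1, 0) peaks at the wrong lag
]
labels = [
    [0, 1],
    [0, 0],
]
estimates = fragment_itd(xcorr, labels, lags)
print(estimates)
```

In the toy data, cell (1, 0) on its own would point to a lag of -1, but pooling it with the other cells of fragment 0 yields +1. This is the intuition behind estimating location per fragment rather than per frame.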
Similar resources
Binaural Cues for Fragment-Based Speech Recognition in Reverberant Multisource Environments
This paper addresses the problem of speech recognition using distant binaural microphones in reverberant multisource noise conditions. Our scheme employs a two-stage fragment decoding approach: first, spectro-temporal acoustic source fragments are identified using signal level cues, and second, a hypothesis-driven stage simultaneously searches for the most probable speech/background fragment labe...
A hearing-inspired approach for distant-microphone speech recognition in the presence of multiple sources
This paper addresses the problem of speech recognition in reverberant multisource noise conditions using distant binaural microphones. Our scheme employs a two-stage fragment decoding approach inspired by Bregman’s account of auditory scene analysis, in which innate primitive grouping ‘rules’ are balanced by the role of learnt schema-driven processes. First, the acoustic mixture is split into l...
Recent advances in fragment-based speech recognition in reverberant multisource environments
This paper addresses the problem of speech recognition using distant binaural microphones in reverberant multisource noise conditions. Our scheme employs a two-stage fragment decoding approach: first, spectro-temporal acoustic source fragments are identified using signal level cues, and second, a hypothesis-driven stage simultaneously searches for the most probable speech/background fragment labe...
The Function of Pitch Range Variations in Samples of Emotional Expressions in Persian
This study aims to investigate the interface between emotion and intonation patterns (more specifically, duration and pitch amplitude of speech). To this end, the acoustic properties of spectral parameters related to speech prosody are investigated. The results of acoustic and statistical analysis show that mean level and range of F0 in the contours vary strongly as a function of the degree o...
Acoustic analysis of lexical tone in Mandarin infant-directed speech.
Using Mandarin Chinese, a "tone language" in which the pitch contours of syllables differentiate words, the authors examined the acoustic modifications of infant-directed speech (IDS) at the syllable level to test 2 hypotheses: (a) the overall increase in pitch and intonation contour that occurs in IDS at the phrase level would not distort lexical pitch at the syllable level and (b) IDS provide...